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Abstract 

We consider the numerical irreducible decomposition of a positive di- 
mensional solution set of a polynomial system into irreducible factors. 
Path tracking techniques computing loops around singularities connect 
points on the same irreducible components. The computation of a lin- 
ear trace for each factor certifies the decomposition. This factorization 
method exhibits a good practical performance on solution sets of relative 
high degrees. 

Using the same concepts of monodromy and linear trace, we present a 
new monodromy breakup algorithm. It shows a better performance than 
the old method which requires construction of permutations of witness 
points in order to break up the solution set. In contrast, the new algo- 
rithm assumes a finer approach allowing us to avoid tracking unnecessary 
homotopy paths. 

As we designed the serial algorithm keeping in mind distributed com- 
puting, an additional advantage is that its parallel version can be easily 
built. Synchronization issues resulted in a performance loss of the straight- 
forward parallel version of the old algorithm. Our parallel implementation 
of the new approach bypasses these issues, therefore, exhibiting a better 
performance, especially on solution sets of larger degree. 
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1 INTRODUCTION 



1.1 Problem Statement 

As polynomial equations emerge more and more often in various fields of science 
and engineering, the question of simplification of polynomials and polynomial 
systems becomes of the most importance. How can we simplify? One way to 
understand better the solution set of a polynomial is to factor it; equivalently, in 
case of a polynomial system we talk about finding an irreducible decomposition 
of its solution set, a central problem in numerical algebraic geometry |34l I35| . 

A widely known family of polynomial systems used for benchmarking is 
"cyclic n-roots" , which arose in Fourier analysis [3[ 0] 
running example: 

Xl + X 2 + X 3 + X 4 
X\X2 + X 2 X 3 + X 3 X 4 + X4X1 
X1X2X3 + X2X3X4 + X3X4X1 + X4X1X2 
X1X2X3X4 — 1 

This system has a one dimensional solution component of degree four, which 
becomes obvious by the substitution X3 — —x\ and x 4 = ~x 2 - After this 
substitution, the first three equation vanish and the last equation simplifies to 
x\x\ — 1. Since x\x\ — 1 = {x\X2 — l)(a;ia;2 + 1), the curve of degree four factors 
in two irreducible quadrics. 

We denote systems of polynomial equations by f (x) = (/i(x), . . . , / m (x)) = 
0, where 6 C[x] = C[xi, . . . ,x n ] for all i. Very often, the coefficients 
are known with limited accuracy. The solution set V to f(x) = is natu- 
rally organized into pure dimensional solution sets V = [Vo, V\, . . . , V n ], where 
dim(Vfc) — k. A numerical representation of a pure dimensional solution set 
is a witness set 20] |35j. which consists of 

f . the polynomials fi (i = 1, . . . , m); 

2. k linear equations L(x) = (L^x), . . . , L^(x)) = with generic coefficients 
describing k generic hyperplane slices; 

3. a list W of deg(Vfe) solutions to the system f(x) = L(x) = 0. 

By the generic choice of the coefficients of the L(x) = 0, the k hyperplanes 
defined by L cut out exactly as many isolated regular solutions on Vk as deg(Vfe). 

Notice how the treatment of positive dimensional solution sets is reduced 
to dealing with collections of generic points. Using slack variables we reduce 
overdetermined polynomial systems to systems with as many variables as un- 
knowns 28 . For a system like cyclic 4- roots in , we add one slack variable z 
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to the system: 



Xl + x 2 + X 3 + X4 + 


b\z 


= 


X\X2 + X 2 X 3 + X 3 X 4 + X4X1 + 


b 2 z 


= 


X1X2X3 + X2X3X4 + x 3 x 4 X\ + x^xiXi + 


b 3 z 


= 


XIX2X3X4 — 1 + 


64 z 


= 


c + cixi + c 2 x 2 + C3X3 + C4X4 


+ z 


= 0, 



(2) 



where the coefficients b\, . . ., 64, cq, c\, . . ., C4 are randomly chosen complex 
numbers. The extra linear equation reduces the dimension of the solution set 
by one and we find generic points on the curve as regular solutions with z = 0. 
For the cyclic 4-roots system the witness set contains four generic points, as the 
degree of the curve equals four. 

The main question now is: given a positive dimensional solution set V, can 
we find its decomposition into the irreducible components? In the language 
of witness sets this interprets as: given a witness set (f , L, W) of V, can we 
find a decomposition W = U . . . U W' r > such that for all i the witness set 
(f , L, W^) represents an irreducible component of VI 

Finding polynomial time algorithms for the factorization of multivariate 
polynomial with approximate coefficients was posed in I20j as one of the chal- 
lenges in symbolic computation. This challenge received a lot of attention 00 
03 |H1 El CH El Eni 12] ES]; see Q f° r a mce description of recent methods. 

1.2 Numerical Homotopies define Loops around Singular- 
ities 

In |3Uj. a new numerical algorithm using homotopy continuation methods was 
proposed to decompose a positive dimensional solution set into irreducible fac- 
tors. Linear traces were proposed in |31| to certify a numerical irreducible 
decomposition. The implementation j.i2j was adjusted to the important special 
case of factoring one single multivariate complex polynomial in |33j . 

In this section we outline the idea of exploiting monodromy using homotopies 
to define loops around singularities. Assume two witness sets (f, Li,Wi) and 
(f, Li2,W2) represent the same positive dimensional irreducible component V. 
Consider the system H 7i l!,l 2 ( x , t) : 

{(i-oiaW+TO-w - (<e[M) (3) 

where 7 is a generic nonzero complex number. Then, due to the generic choice 
of 7, for a fixed value of t the solutions to H 7i l 1 ,l 2 ( x )*) are au isolated. In 
particular, these are W% for t — and W% for t = 1. 

Tracking solutions of H 7i L li L 2 (x,t) as t varies from to 1 defines a 1-to-l 
map </> 7 .Li,l 2 : Wi — > W2- If the composition of two such maps defines a loop 
around a singularity for some t, then a permutation of the points of a witness 
set is obtained, in particular: 

7r 7 ,L 1 X 2 = 073,112,1,! °0 7l ,li,l2 (4) 
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is a permutation of W\. All permutations that arise in this fashion form a 
subgroup of the symmetry group acting on W\ and the orbits of this action are 
witness sets that represent irreducible components. 

The idea to exploit monodromy first appeared in a theoretical complexity 
study [2]- Although our approach does not need to know the precise location 
of the singularities, one could as in JT] compute those for algebraic curves, see 
also the command algcurves [monodromy] in Maple. 

The algorithm in |3U| collects points connected by loops into the same witness 
sets which converge to numerical representations of the irreducible components. 
In a stop criterion for this algorithm was presented, using the linear trace. 
We explain this trace test on a system like cyclic 4-roots. Note our program 
only works with generic points obtained as solutions of 10) and does not have 
a symbolic polynomial of degree four. Via a generic projection we map the 
points in 4-space down to the plane. If two of the four points to belong to the 
same irreducible component, there must exist a quadratic polynomial p{x,y) 
vanishing at those two points and at any point of the quadratic irreducible 
factor. The linear trace is then defined rewriting p(x, y) as p(x, y(x)): 



where t\(x) is the linear trace. If t\ was not linear, then deg(p) > 2. So 
t\(x) = ax + b, for some a and b to be determined by interpolation at x = xq 
and x — x%, with corresponding y-values in (xo,yoi), (^o, 2/02), (%i,yii), and 
(0:1,2/12). If for an additional sample, at x — X2 with corresponding y- values 
( x 2,V2i), (^2,2/22), we have ti(x 2 ) = 2/21 + J/22, then we have an irreducible 
quadratic factor, otherwise the two points do not lie on the same factor. 

The linear trace test, called zero-sum relations, was first introduced in [57] 
and further developed in [551 . For factors of small to moderate degree, the 
linear trace test can be applied in an exhaustive combinatorial enumeration as 
was proposed in ^] [221 , [21] and improved in [H] . 

1.3 Parallel Algorithms 

Homotopy continuation methods are very well suited for parallel processing as 
after distributing the path tracking jobs among the computers in the network, no 
further communications are needed, see [Tll51ll7j for granularity issues. For com- 
putational algebraic geometry, this implies that homotopy methods can solve 
much larger polynomial systems than methods in computer algebra which are 
harder to adapt to parallel computers |21|. One recent example is the solution 
of the cyclic 13-roots problems with PHoM for which 2,704,156 paths 

were tracked. 

Modern homotopies in numerical algebraic geometry often appear in a se- 
quence like the Pieri homotopies [57j where the start solutions of one homotopy 
lie at the end of paths defined by another homotopy. The homotopies to factor 



p(x,y(x)) 



(y - yi(x))(y - V2{x)) 

y 2 - (yi(x) + V2{x))y + y 1 {x)y 2 {x) 
y 2 -t 1 (x)y + t 2 (x), 



(5) 
(G) 
(7) 
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positive dimensional solution sets raise job scheduling issues as the decision to 
certain track paths depends on the outcome of other paths. This paper can be 
regarded as a solution to the job scheduling problems raised in |22| . 

The parallel algorithm proposed in [22] exhibited a good speedup in the path 
tracking jobs, but the certification with linear traces executed in only by the 
master node before the scheduling of new path tracking jobs diminished the 
overall performance as all nodes were idling waiting for the assignment of new 
path tracking jobs. At the end of |221 we outlined a probabilistic complexity 
study simulated in Maple, suggesting various job scheduling techniques. In this 
paper we report on its parallel implementation confirming the efficiency of the 
approach. 



2 USING MONODROMY MORE EFFICIENTLY 

The monodromy breakup algorithm of |3U1 l3"T] is sketched by Algorithm 12.11 
On input is a witness set Wl and on output a partition of Wl, corresponding 
to the irreducible decomposition. 

Algorithm 2.1 Monodromy Breakup certified by Linear Trace: V = Brcakup(T / l / L, d, N) 
Input: W Ll d, N. 
Output: V. 



0. initialize V with d singletons; 

1. generate two slices L' and L" parallel to L; 

2. track d paths for witness set with L'; 

3. track d paths for witness set with L"; 

4. for k from 1 to A do 

4.1 generate new slices K and a random 7; 

4.2 track d paths to a new slice; 

4.3 generate a new random 7; 

4.4 track d paths to return to the base slice; 

4.5 compute the permutation and update V] 

4.6 if linear trace test certifies V 
then leave the loop; 

end if; 
end for. 

Our first parallel implementation of this algorithm, described in [22], uses a 
master /servant model where the master node distributes the paths among the 
available processors in the network. According to our experimental results, a 
sizeable speedup is achieved by distributing the routine path-tracking jobs to 
different nodes. However, a probabilistic study in |221 suggested that we can 
save some work by taking a smaller one-path-one-point tracking job as an atomic 
task. 

Following the previous discussion, we take two generic slices Li and L2 (in 
this case these are hyperplanes) and look at the witness points W\ and Wi . 
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Consider the bipartite graph with vertices W\ on one side and W2 on the other. 

One atomic step of the monodromy breakup algorithm consists of creating 
a map like </> 7 .Li,l 2 - We can visualize such a map by connecting the points of 
W\ and W2 that map into each other with an edge. 

? 

1 1 • first quadric 

? *? 

■ 1 O second quadric 

? 4 

« > ^ map 7 ,L!,l 2 

9 

Figure 1: The bipartite graph W\ <-> W2 for cyclic4 

As you may see in our example, in order to create one permutation we need 
to construct 8 edges in the graph. If one is lucky then it may take [MBA-P] 
only one permutation to decompose cyclic4: 




Figure 2: Permutation (12) (34) for cyclic4 

The connected components of the graph in Figure |2 correspond to the two 
witness sets that, in turn, represent two irreducible components of the solution 
set of cyclic4: two quadric curves. 

In fact 2 out of 8 edges in Figure |21 can be removed keeping the connected 
components still connected, see Figure [21 Since each edge can be created by 
tracking only one point of a witness set, we may avoid doing extra work by 
trying to create as few edges as possible. 
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3 O. 




4 d> 



W 2 



Figure 3: A 6-edge graph for cyclic4 



3 A NEW ALGORITHM 



In this section we first describe the sequential version of our new monodromy 
breakup algorithm before addressing its parallel execution. 

3.1 Serial Version 

The flow chart of our new algorithm is show in Figure 0] As in Algorithm 12.11 
we also have the initialization of the "trace grid" , which are the two witness 
sets on two parallel slices needed to certify the irreducible decomposition using 
linear traces. While Algorithm 12 . II first completes all the loops for all points in 
a witness set before proceeding to the next level, our new approach initializes s 
new witness sets which are available for generating loops. 

The main loop of the new algorithm shown in Figure 01 leaves much freedom 
to complete loops between any two slices. For every slice, the algorithm keeps 
track of the number of loops that did not yielded a permutation, stored in N re j. 
Based on these statistics, the algorithm can discriminate against slices which 
were not productive in the past and select those slices which led to more new 
permutations. 

As explained in the previous section, the algorithm typically returns with a 
certified decomposition before all loops are completed. This is the main reason 
why on single processors, our new algorithm outperforms Algorithm l2.ll At the 
same time, the new algorithm is more suitable for parallel execution, as we will 
explain next. 

3.2 Parallel Version 

The parallel version of our new algorithm runs in a master/servant model. The 
initialization phase is very similar to the initialization of the parallel version of 
Algorithm 12.11 with the master distributing path tracking jobs evenly among 
all nodes. After the initialization, the master node keeps looking for available 
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path tracking nodes to assign paths and the other nodes are either busy tracking 
paths or ready to start new path tracking jobs. 

Compared to our previous parallel algorithm described in |22| , the computa- 
tion of the linear trace by the master node is now interleaved by path tracking 
jobs performed on the other nodes. 

4 EXPERIMENTAL RESULTS 

Our algorithms are implemented using the path tracking routines in PHCpack |36| . 
extended in |.'{2j with facilities for a numerical irreducible decomposition. As 
in [37], we apply MPI for message passing. Our main program is written in C 
and links with the interface of PHCpack. 

Our equipment consists of two personal cluster machines purchased from 
Rocketcalc (www.rocketcalc.com) for a total of 12 2.4Ghz CPUs, served by 
a Dell workstation with two dual 2.4Ghz processors. So in total we have 14 
processors at our disposal. 

4.1 Plain Parallel Path Tracking 

In Table ^ we show with three runs the main defect of our first parallel imple- 
mentation presented in |22j . While we have no real control over the number of 
loops it takes to complete the factorization, we observe from the data in Tabled 
that as the total number of loops increases, the work done by the master node 
increases. 



#loops 


4 


6 


9 


min track 


8.0 sec 


10.9 sec 


18.5 sec 


max track 


10.8 sec 


15.7 sec 


21.8 sec 


master 


1.8 sec 


3.8 sec 


7.6 sec 


total 


12.6 sec 


19.5 sec 


29.4 sec 



Table 1: Three runs with the first parallel monodromy breakup algorithm, exe- 
cuting respectively 4, 6, and 9 loops to factor a curve of degree 144 in 8 space on 
14 processors. We report the minimal and maximal time the nodes spent track- 
ing paths, and the time spent by the master node certifying the decomposition 
and scheduling the jobs. 

Although the cyclic 8-roots system is a problem of modest size, we already 
observe in Table that for 9 loops, more than 25% of the time is spent by the 
master node, while all the other nodes are idling. For larger problems and more 
processors, the poor performance of this first implementation will become even 
more apparent. 
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4.2 Performance of the new Algorithm 

Tabic |2 reports on five runs done on 14 processors to decompose the curve of 
degree 144 denned by the cyclic 8-roots system. For three of the five runs we 
used three new slices. In the last two runs we see that the total time decreases 
if we use only two new slices. 





3 new slices 


2 new slices 


^runs 


1 


2 


3 


4 


5 


initial 


8.73 


9.01 


8.89 


6.54 


6.98 


master 


6.06 


6.22 


6.18 


6.67 


7.10 


min track 


5.96 


6.16 


6.07 


6.60 


7.02 


max track 


6.06 


6.24 


6.23 


6.11 


7.15 


total 


14.9 


15.4 


15.3 


13.4 


14.2 



Table 2: Five runs with our new parallel monodromy breakup algorithm, three 
times with 3 new slices and two times with 2 new slices. We report the time used 
for initialization, the time spent by the master node, the minimal and maximal 
time for the nodes spent tracking paths, and the total time. All reported times 
are expressed in seconds. 

In TableElwe see an even distribution of the time spent by the nodes. Using 
fewer slices reduces the initialization time at the expense of a slightly higher 
running time in the main loop. 

Comparing to the timings in Table^we do not notice such a wide fluctuation 
in the total execution time between different numbers of loops. The total exe- 
cution time of the most favorable situation reported in Tabled is only slightly 
lower than the best total time in Tabic [3 

Finally, we report on a calculation of a larger example, the ideal of adjacent 
2-by-2 minors of a general 2-by-9 matrix of 18 unknowns, see an d EH- This 
system in 18 variables defines a 10-dimcnsional surface of degree 256 which 
factors in 34 irreducible components. The total execution time of our new 
monodromy breakup algorithm on 14 processors is 97.1 seconds, of which 62.7 
are spent on the initialization, 33.8 seconds by the master node in the main loop 
while the time path tracking on the other noeds fluctuated between 32.9 and 
35.6 seconds. 

The performance of our first parallel algorithm on this system is even more 
erratic. The very best complete run of 3 monodromy loops took 122.9 seconds 
on 14 processors, where the path tracking time ranged between 75.9 and 104.4 
seconds. Even on this very best run, our new algorithm still takes only 80% of 
the time spent by the first parallel monodromy breakup algorithm of [22J . 
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5 CONCLUSIONS 



In this paper we report on the development and performance of a new mon- 
odromy breakup algorithm. Experimental results show a more predictable and 
regular performance than our first parallel implementation of |22| . Thanks to 
this increased performance, it is now possible to factor solution sets of larger 
degrees. 
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Initialization: 

P := {{&} | 1 < a < d, {a} is not a component} 
Q := {{ fl l I 1 — a — {a} is a component} 

JVtot = N red = 
Construct s witness sets using random slices 
Label the witness points on the slices 
Construct the trace grid 




Pick the smallest part p E P 
and a label a e p 



Pick two slices L\ and i 2 



b := track(a 1 L\, L<i) 
Ntot ■= N tot + 1 
Find q £ P that contains the label b 




N rej • — N re j -\- 1 



Return Q, N tot , N rej 



Merge p and q: 
P:=P\{p,q} 
P := P U {p U q} 




P:=P\{pUq} 
Q := Q U {p U q} 



Figure 4: The flow chart of our new Monodromy Breakup Algorithm 
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